Telco Customer Churn

In this article, we analyze and predict customer churn using the Telco Customer Churn dataset.

Dataset

| Column | Description |
|---|---|
| customerID | Customer ID |
| gender | Whether the customer is a male or a female |
| SeniorCitizen | Whether the customer is a senior citizen or not (1, 0) |
| Partner | Whether the customer has a partner or not (Yes, No) |
| Dependents | Whether the customer has dependents or not (Yes, No) |
| tenure | Number of months the customer has stayed with the company |
| PhoneService | Whether the customer has a phone service or not (Yes, No) |
| MultipleLines | Whether the customer has multiple lines or not (Yes, No, No phone service) |
| InternetService | Customer’s internet service provider (DSL, Fiber optic, No) |
| OnlineSecurity | Whether the customer has online security or not (Yes, No, No internet service) |
| OnlineBackup | Whether the customer has an online backup or not (Yes, No, No internet service) |
| DeviceProtection | Whether the customer has device protection or not (Yes, No, No internet service) |
| TechSupport | Whether the customer has tech support or not (Yes, No, No internet service) |
| StreamingTV | Whether the customer has streaming TV or not (Yes, No, No internet service) |
| StreamingMovies | Whether the customer has streaming movies or not (Yes, No, No internet service) |
| Contract | The contract term of the customer (Month-to-month, One year, Two year) |
| PaperlessBilling | Whether the customer has paperless billing or not (Yes, No) |
| PaymentMethod | The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic)) |
| MonthlyCharges | The amount charged to the customer monthly |
| TotalCharges | The total amount charged to the customer |
| Churn | Whether the customer churned or not (Yes or No) |

Preprocessing

Integer Columns

Float Columns

Object Columns
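The integer, float, and object columns can be separated with pandas' `DataFrame.select_dtypes`. The sketch below parses a hypothetical two-row excerpt held in a string (made-up IDs and values, only a subset of the columns) so it runs without the Kaggle file; with the real data you would pass the path of the downloaded CSV to `pd.read_csv` instead. In the Kaggle CSV, a few TotalCharges entries are blank, which is why pandas reads that column as text rather than as floats; the sketch mimics this with one blank entry and coerces it with `pd.to_numeric(..., errors="coerce")`.

```python
import io

import pandas as pd

# Hypothetical excerpt: made-up IDs and values, only a subset of the columns.
# The second row mimics a blank TotalCharges entry.
SAMPLE_CSV = (
    "customerID,gender,SeniorCitizen,tenure,MonthlyCharges,TotalCharges,Churn\n"
    "0001-AAAAA,Female,0,1,29.85,29.85,No\n"
    "0002-BBBBB,Male,1,0,20.25, ,Yes\n"
)

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Group column names by dtype, mirroring the integer / float / object split.
int_cols = df.select_dtypes(include="int64").columns.tolist()
float_cols = df.select_dtypes(include="float64").columns.tolist()
# Everything non-numeric; this also works if pandas infers a string dtype.
object_cols = df.select_dtypes(exclude="number").columns.tolist()

print(int_cols)     # ['SeniorCitizen', 'tenure']
print(float_cols)   # ['MonthlyCharges']
print(object_cols)  # ['customerID', 'gender', 'TotalCharges', 'Churn']

# The blank entry keeps TotalCharges non-numeric; coerce it so that
# unparseable values become NaN, then decide how to handle those rows.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
print(df["TotalCharges"].isna().sum())  # 1
```

Whether the resulting NaN rows are dropped or filled (for example with 0 for brand-new customers) is a modeling choice the sketch leaves open.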

Yes/No Columns

We can encode all Yes/No columns as follows:

$$ \begin{cases} 0 &\mbox{No}\\ 1 &\mbox{Yes}\end{cases} $$
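A minimal sketch of this mapping with `Series.map`. The list of Yes/No columns is read off the dataset table (Partner, Dependents, PhoneService, PaperlessBilling, Churn); the helper name `encode_yes_no` and the sample values are hypothetical.

```python
import pandas as pd

# Columns whose only values are "Yes" and "No", according to the dataset table.
YES_NO_COLS = ["Partner", "Dependents", "PhoneService", "PaperlessBilling", "Churn"]
YES_NO_MAP = {"No": 0, "Yes": 1}


def encode_yes_no(df: pd.DataFrame, cols=YES_NO_COLS) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with each Yes/No column mapped to 0/1."""
    out = df.copy()
    for col in cols:
        # Any value outside the mapping (e.g. a typo) becomes NaN, which makes it easy to spot.
        out[col] = out[col].map(YES_NO_MAP)
    return out


# Made-up sample rows.
sample = pd.DataFrame({
    "Partner": ["Yes", "No"],
    "Dependents": ["No", "No"],
    "PhoneService": ["Yes", "Yes"],
    "PaperlessBilling": ["No", "Yes"],
    "Churn": ["No", "Yes"],
})
encoded = encode_yes_no(sample)
print(encoded["Churn"].tolist())  # [0, 1]
```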

Internet Services

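Several other columns (OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, and StreamingMovies) can be converted similarly, although they take a third value, No internet service; in addition, we need to create a new feature.

These columns can be coded as follows

$$\begin{cases} 0 &\mbox{No internet service} \\ 1 &\mbox{No}\\ 2 &\mbox{Yes}\end{cases}$$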
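The three-level coding can be applied to all six internet add-on columns in one pass. This is a sketch; the helper name `encode_internet_addons` and the sample rows are hypothetical, and the column list is taken from the dataset table (the columns whose values include "No internet service").

```python
import pandas as pd

# Columns that take "Yes", "No" or "No internet service" (see the dataset table).
INTERNET_ADDON_COLS = [
    "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]
ADDON_MAP = {"No internet service": 0, "No": 1, "Yes": 2}


def encode_internet_addons(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with the add-on columns coded as 0/1/2."""
    out = df.copy()
    for col in INTERNET_ADDON_COLS:
        out[col] = out[col].map(ADDON_MAP)
    return out


# Made-up sample: every add-on column gets the same three values.
sample = pd.DataFrame({col: ["No internet service", "No", "Yes"] for col in INTERNET_ADDON_COLS})
coded = encode_internet_addons(sample)
print(coded["TechSupport"].tolist())  # [0, 1, 2]
```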

Moreover, note that the InternetService column itself takes three values: No, DSL, and Fiber optic.

This column can be coded as follows

$$\mbox{InternetServiceType} = \begin{cases} 0 &\mbox{No} \\ 1 &\mbox{DSL}\\ 2 &\mbox{Fiber optic}\end{cases}$$
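A sketch of deriving the InternetServiceType feature from InternetService with the mapping above. The helper name and sample rows are hypothetical, and whether the original InternetService column is dropped afterwards is not stated here; this sketch keeps it.

```python
import pandas as pd

INTERNET_SERVICE_MAP = {"No": 0, "DSL": 1, "Fiber optic": 2}


def add_internet_service_type(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with an integer-coded InternetServiceType column."""
    out = df.copy()
    out["InternetServiceType"] = out["InternetService"].map(INTERNET_SERVICE_MAP)
    return out


# Made-up sample rows.
sample = pd.DataFrame({"InternetService": ["DSL", "Fiber optic", "No"]})
with_type = add_internet_service_type(sample)
print(with_type["InternetServiceType"].tolist())  # [1, 2, 0]
```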

Phone Services

Since the PhoneService feature already records whether the customer has phone service, we can encode MultipleLines as

$$ \mbox{MultipleLines} = \begin{cases} 0 &\mbox{No, No phone service}\\ 1 &\mbox{Yes}\end{cases} $$
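Merging "No phone service" into 0 loses nothing only if that value appears exactly where PhoneService is "No". The sketch below checks this on made-up sample rows before applying the mapping; the sample data is hypothetical.

```python
import pandas as pd

# "No phone service" is folded into "No": PhoneService already records
# whether the customer has a phone line at all.
MULTIPLE_LINES_MAP = {"No phone service": 0, "No": 0, "Yes": 1}

# Made-up sample rows.
sample = pd.DataFrame({
    "PhoneService": ["No", "Yes", "Yes"],
    "MultipleLines": ["No phone service", "No", "Yes"],
})

# Sanity check: "No phone service" should occur exactly where PhoneService is "No".
no_phone = sample["MultipleLines"].eq("No phone service")
consistent = bool((no_phone == sample["PhoneService"].eq("No")).all())
print(consistent)  # True

sample["MultipleLines"] = sample["MultipleLines"].map(MULTIPLE_LINES_MAP)
print(sample["MultipleLines"].tolist())  # [0, 0, 1]
```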

Remaining Columns

Contract

$$ \mbox{Contract} = \begin{cases} 0 &\mbox{Month-to-month}\\ 1 &\mbox{One year}\\ 2 &\mbox{Two year} \end{cases} $$

Gender

$$ \mbox{Gender} = \begin{cases} 0 &\mbox{Female}\\ 1 &\mbox{Male}\end{cases} $$
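Both the Contract and Gender codings above are plain dictionary lookups. A sketch with made-up rows; note that in the raw data the gender column is spelled in lowercase (`gender`).

```python
import pandas as pd

# Contract terms have a natural order (short to long), so an ordinal code keeps it.
CONTRACT_MAP = {"Month-to-month": 0, "One year": 1, "Two year": 2}
# Binary column; which value is coded as 1 is an arbitrary choice.
GENDER_MAP = {"Female": 0, "Male": 1}

# Made-up sample rows.
sample = pd.DataFrame({
    "Contract": ["Two year", "Month-to-month", "One year"],
    "gender": ["Male", "Female", "Female"],
})
sample["Contract"] = sample["Contract"].map(CONTRACT_MAP)
sample["gender"] = sample["gender"].map(GENDER_MAP)
print(sample["Contract"].tolist())  # [2, 0, 1]
print(sample["gender"].tolist())    # [1, 0, 0]
```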

Payment Method

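In this case, we cannot rank these values, so an ordinal code would impose an order that does not exist. Therefore, we one-hot encode this column, creating one 0/1 indicator column per payment method.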
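Because the payment methods have no natural order, one common choice is one-hot encoding. A minimal sketch using `pandas.get_dummies` on made-up rows (one per payment method):

```python
import pandas as pd

# Made-up sample: one row per payment method.
sample = pd.DataFrame({"PaymentMethod": [
    "Electronic check", "Mailed check",
    "Bank transfer (automatic)", "Credit card (automatic)",
]})

# One indicator column per payment method; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(sample, columns=["PaymentMethod"], dtype=int)
print(encoded.columns.tolist())
# Each row has exactly one 1.
print(encoded.sum(axis=1).tolist())  # [1, 1, 1, 1]
```

Passing `drop_first=True` to `get_dummies` drops one of the indicator columns, which avoids perfectly collinear features in linear models.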

Features with high variance

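Moreover, some features vary on much larger scales than others (for example, TotalCharges compared with the 0/1 indicator columns), which can hurt scale-sensitive models. For this reason, we standardize each feature by removing the mean and scaling to unit variance:

$$ z = \frac{x - \mu}{\sigma} $$

where μ and σ are the mean and standard deviation of the feature.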
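A pandas-only sketch of the z-score transformation, using made-up values. Dividing by the population standard deviation (`ddof=0`) matches what scikit-learn's `StandardScaler` does; the helper name `standardize` is hypothetical.

```python
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: z-score every column (subtract mean, divide by population std)."""
    # ddof=0 uses the population standard deviation, as StandardScaler does.
    # A constant column would have std 0 and produce NaN/inf here.
    return (df - df.mean()) / df.std(ddof=0)


# Made-up sample values.
sample = pd.DataFrame({
    "tenure": [1, 30, 5, 60],
    "MonthlyCharges": [25.0, 70.5, 90.0, 45.5],
})
scaled = standardize(sample)
print(scaled)
```

When training a model, the mean and standard deviation should be computed on the training split only and then reused for the test split, so that no information from the test data leaks into the scaling.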

Feature Correlation
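Once every column is numeric, pairwise correlations can be computed with `DataFrame.corr` (Pearson by default). A sketch on a small, made-up, already-encoded sample, ranking features by the absolute value of their correlation with Churn:

```python
import pandas as pd

# Made-up, already-encoded sample (all columns numeric).
encoded = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8],
    "Contract": [0, 1, 0, 1, 0],
    "MonthlyCharges": [29.9, 57.0, 53.9, 42.3, 99.6],
    "Churn": [0, 0, 1, 0, 1],
})

# Pairwise Pearson correlation between all features.
corr = encoded.corr()

# Correlation of every feature with the target, strongest (in absolute value) first.
churn_corr = corr["Churn"].drop("Churn").sort_values(key=abs, ascending=False)
print(churn_corr)
```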


Saving to a CSV
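A minimal sketch of writing the processed data with `DataFrame.to_csv` and reading it back to check the round trip. The file name and the small frame are hypothetical, and a temporary directory is used so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical processed data.
processed = pd.DataFrame({"tenure": [1, 34], "Churn": [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical output file name.
    path = Path(tmp) / "telco_churn_processed.csv"
    # index=False avoids writing the row index as an extra unnamed column.
    processed.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.equals(processed))  # True
```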


References

  1. Kaggle Dataset: Telco Customer Churn